AITopics

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Dec-26-2025, 23:08:36 GMT

Unsupervised Anomaly Detection with Rejection

Anomaly detection aims to detect unexpected behaviours in data. Because anomaly detection is usually an unsupervised task, traditional anomaly detectors learn a decision boundary by employing heuristics based on intuitions, which are hard to verify in practice. This introduces uncertainty, especially close to the decision boundary, that may reduce user trust in the detector's predictions. One way to combat this is to allow the detector to reject predictions with high uncertainty (Learning to Reject). This requires employing a confidence metric that captures the distance to the decision boundary and setting a rejection threshold to reject low-confidence predictions. However, selecting a proper metric and setting the rejection threshold without labels are challenging tasks.
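The idea of rejecting predictions near the decision boundary can be illustrated with a minimal sketch. It assumes the detector outputs a scalar anomaly score and uses the absolute distance to the decision threshold as a stand-in confidence measure. The function name, the margin parameter, and this choice of metric are illustrative assumptions, not the metric or threshold-setting procedure proposed in the paper.

```python
def predict_with_rejection(scores, decision_threshold, reject_margin):
    """Return 1 (anomaly), 0 (normal), or None (rejected) for each score.

    Assumption: confidence is approximated by |score - decision_threshold|;
    predictions closer to the boundary than reject_margin are rejected.
    """
    predictions = []
    for score in scores:
        if abs(score - decision_threshold) < reject_margin:
            predictions.append(None)  # too close to the boundary: abstain
        else:
            predictions.append(1 if score > decision_threshold else 0)
    return predictions


example = predict_with_rejection([0.1, 0.48, 0.55, 0.9],
                                 decision_threshold=0.5,
                                 reject_margin=0.1)
```

In this toy call the two scores nearest to 0.5 are rejected and the remaining two receive a label. Choosing the margin without labels is exactly the difficulty the abstract points to, so the value here is arbitrary.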

name change, rejection threshold, unsupervised anomaly detection, (5 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.87)

Valeriano, Maria Gabriela, Marzagão, David Kohan, Montelongo, Alfredo, Kiffer, Carlos Roberto Veiga, Katz, Natan, Lorena, Ana Carolina

Filtering instances and rejecting predictions to obtain reliable models in healthcare

arXiv.org Artificial Intelligence Oct-29-2025

Machine Learning (ML) models are widely used in high-stakes domains such as healthcare, where the reliability of predictions is critical. However, these models often fail to account for uncertainty, providing predictions even with low confidence. This work proposes a novel two-step data-centric approach to enhance the performance of ML models by improving data quality and filtering low-confidence predictions. The first step involves leveraging Instance Hardness (IH) to filter problematic instances during training, thereby refining the dataset. The second step introduces a confidence-based rejection mechanism during inference, ensuring that only reliable predictions are retained. We evaluate our approach using three real-world healthcare datasets, demonstrating its effectiveness at improving model reliability while balancing predictive performance and rejection rate. Additionally, we use alternative criteria - influence values for filtering and uncertainty for rejection - as baselines to evaluate the efficiency of the proposed method. The results demonstrate that integrating IH filtering with confidence-based rejection effectively enhances model performance while preserving a large proportion of instances. This approach provides a practical method for deploying ML systems in safety-critical applications.
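The two-step pipeline described above (filter hard instances before training, then reject low-confidence predictions at inference) might be sketched as follows. The abstract does not say how Instance Hardness is computed, so this sketch assumes the k-Disagreeing Neighbours (kDN) measure, one common hardness estimate. All function names, the hardness cut-off, and the confidence cut-off are hypothetical choices for illustration.

```python
import math


def kdn_hardness(X, y, k=3):
    """kDN hardness: fraction of the k nearest neighbours with a different label."""
    hardness = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in dists[:k]]
        hardness.append(sum(y[j] != y[i] for j in neighbours) / k)
    return hardness


def filter_hard_instances(X, y, max_hardness=0.5, k=3):
    """Step 1 (training time): drop instances whose hardness exceeds max_hardness."""
    h = kdn_hardness(X, y, k)
    keep = [i for i, hi in enumerate(h) if hi <= max_hardness]
    return [X[i] for i in keep], [y[i] for i in keep]


def reject_low_confidence(probas, min_confidence=0.8):
    """Step 2 (inference time): return the predicted class index, or None to reject."""
    decisions = []
    for p in probas:
        label = max(range(len(p)), key=p.__getitem__)
        decisions.append(label if p[label] >= min_confidence else None)
    return decisions
```

A model would be trained on the output of `filter_hard_instances`, and its predicted class probabilities would then be passed through `reject_low_confidence`. Both cut-offs trade predictive performance against the number of instances kept, which is the balance the abstract refers to.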

artificial intelligence, dataset, machine learning, (19 more...)

2510.24368

Country: South America > Brazil (0.46)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Neural Information Processing Systems Aug-21-2025, 21:39:11 GMT

Agnostic Active Learning Without Constraints

Alina Beygelzimer, Daniel J. Hsu, John Langford, Tong Zhang

We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hasler, Stephan, Fischer, Lydia

Stacked Confusion Reject Plots (SCORE)

arXiv.org Artificial Intelligence Jun-25-2024

Machine learning is increasingly applied in critical application areas like health and driver assistance. To minimize the risk of wrong decisions, such applications need to consider the certainty of a classification in order to reject uncertain samples. Established tools for this are reject curves, which visualize the trade-off between the number of rejected samples and classification performance metrics. We argue that common reject curves are too abstract and hard for non-experts to interpret. We propose Stacked Confusion Reject Plots (SCORE) that offer a more intuitive understanding of the data used and the classifier's behavior. We present example plots on artificial Gaussian data to document the different options of SCORE and provide the code as a Python package.
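For readers unfamiliar with the reject curves mentioned above, the following minimal sketch computes one: it rejects samples in order of increasing confidence and records the accuracy on the samples that remain. This illustrates a conventional accuracy-reject curve only, not the SCORE plots or the authors' Python package. The function name and the toy inputs are assumptions.

```python
def reject_curve(confidences, correct):
    """Return (rejection_rate, accuracy_on_kept) pairs.

    Samples are rejected from least to most confident; `correct` holds one
    boolean per sample saying whether the classifier's prediction was right.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n = len(order)
    curve = []
    for n_rejected in range(n):
        kept = order[n_rejected:]
        accuracy = sum(correct[i] for i in kept) / len(kept)
        curve.append((n_rejected / n, accuracy))
    return curve


curve = reject_curve([0.9, 0.6, 0.8, 0.55], [True, False, True, True])
```

Each point on the curve answers "if this fraction of the least certain samples is rejected, how accurate is the classifier on the rest?".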

classifier, reject option, stacked confusion reject plot, (14 more...)

2406.17346

Country:

North America > United States (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Neural Information Processing Systems Mar-15-2024, 15:22:43 GMT

Agnostic Active Learning Without Constraints

Daniel Hsu, IBM Research

We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

active learning, algorithm, learning, (15 more...)

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Industry: Information Technology (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Motlagh, Nicholas Kashani, Davis, Jim, Anderson, Tim, Gwinnup, Jeremy

Learning When to Say "I Don't Know"

arXiv.org Artificial Intelligence Feb-15-2023

We propose a new Reject Option Classification technique to identify and remove regions of uncertainty in the decision space for a given neural classifier and dataset. Such existing formulations employ a learned rejection (remove)/selection (keep) function and require either a known cost for rejecting examples or strong constraints on the accuracy or coverage of the selected examples. We consider an alternative formulation by instead analyzing the complementary reject region and employing a validation set to learn per-class softmax thresholds. The goal is to maximize the accuracy of the selected examples subject to a natural randomness allowance on the rejected examples (rejecting more incorrect than correct predictions). We provide results showing the benefits of the proposed method over naïvely thresholding calibrated/uncalibrated softmax scores with 2-D points, imagery, and text classification datasets using state-of-the-art pretrained models. Source code is available at https://github.com/osu-cvl/learning-idk.
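As a rough illustration of what per-class softmax thresholds do at inference time, the sketch below abstains whenever the winning class's softmax score falls below that class's own threshold. How the thresholds are learned from a validation set is the paper's contribution and is not reproduced here. The threshold values and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_abstain(logits, class_thresholds):
    """Return the predicted class index, or None for "I don't know".

    class_thresholds[c] is the minimum softmax score required to accept a
    prediction of class c (assumed to be given, e.g. from a validation set).
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred if probs[pred] >= class_thresholds[pred] else None
```

With a single shared threshold this reduces to the naive softmax thresholding that the abstract uses as a baseline. Per-class thresholds let classes that are easy to confuse demand more evidence than others.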

accuracy, artificial intelligence, machine learning, (18 more...)

2209.04944

Country:

North America > United States > Ohio (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Perini, Lorenzo, Giannuzzi, Daniele, Davis, Jesse

How to Allocate your Label Budget? Choosing between Active Learning and Learning to Reject in Anomaly Detection

arXiv.org Artificial Intelligence Jan-7-2023

Anomaly detection attempts to find examples that deviate from the expected behaviour. Usually, anomaly detection is tackled from an unsupervised perspective because anomalous labels are rare and difficult to acquire. However, the lack of labels leaves the anomaly detector with high uncertainty in some regions, which usually results in poor predictive performance or low user trust in the predictions. One can reduce such uncertainty by collecting specific labels using Active Learning (AL), which targets examples close to the detector's decision boundary. Alternatively, one can increase user trust by allowing the detector to abstain from making highly uncertain predictions, which is called Learning to Reject (LR). One way to do this is by thresholding the detector's uncertainty based on where its performance is low, which requires labels to be evaluated. Although both AL and LR need labels, they work with different types of labels: AL seeks strategic labels, which are evidently biased, while LR requires i.i.d. labels to evaluate the detector's performance and set the rejection threshold. Because one usually has a single label budget, deciding how to optimally allocate it is challenging. In this paper, we propose a mixed strategy that, given a budget of labels, decides in multiple rounds whether to use the budget to collect AL labels or LR labels. The strategy is based on a reward function that measures the expected gain when allocating the budget to either side. We evaluate our strategy on 18 benchmark datasets and compare it to some baselines.
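The Active Learning side of this trade-off, querying labels for examples close to the detector's decision boundary, can be sketched in a few lines. This shows plain uncertainty sampling on scalar anomaly scores only. It does not implement the paper's reward function or its round-based allocation between AL and LR, and the function name and parameters are assumptions.

```python
def select_queries(scores, decision_threshold, budget):
    """Return indices of the `budget` examples nearest the decision boundary.

    Assumption: closeness of the anomaly score to decision_threshold is used
    as the uncertainty measure that drives label acquisition.
    """
    ranked = sorted(range(len(scores)),
                    key=lambda i: abs(scores[i] - decision_threshold))
    return ranked[:budget]


queries = select_queries([0.1, 0.48, 0.55, 0.9], decision_threshold=0.5, budget=2)
```

Labels gathered this way are concentrated near the boundary, which is why the abstract calls them biased and why they cannot simply be reused to set a rejection threshold that needs i.i.d. labels.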

data mining, detector, machine learning, (14 more...)

2301.02909

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Oct-21-2022

Reducing Training Sample Memorization in GANs by Training with Memorization Rejection

Bai, Andrew, Hsieh, Cho-Jui, Kan, Wendy, Lin, Hsuan-Tien

Generative adversarial networks (GANs) continue to be a popular research direction due to their high generation quality. It is observed that many state-of-the-art GANs generate samples that are more similar to the training set than to a holdout testing set from the same distribution, hinting that some training samples are implicitly memorized in these models. This memorization behavior is unfavorable in many applications that demand the generated samples to be sufficiently distinct from known samples. Nevertheless, it is unclear whether it is possible to reduce memorization without compromising the generation quality. In this paper, we propose memorization rejection, a training scheme that rejects generated samples that are near-duplicates of training samples during training. Our scheme is simple, generic and can be directly applied to any GAN architecture. Experiments on multiple datasets and GAN models validate that memorization rejection effectively reduces training sample memorization, and in many cases does not sacrifice the generation quality. Code to reproduce the experiment results can be found at https://github.com/jybai/MRGAN.
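The rejection step itself, discarding generated samples that are near-duplicates of training samples, might look like the minimal sketch below. It assumes samples are plain numeric vectors compared by Euclidean distance with a fixed cut-off. The distance measure, the cut-off value, and the function name are illustrative assumptions and are not taken from the MRGAN code.

```python
import math


def reject_memorized(generated, training, min_distance):
    """Keep only generated samples at least min_distance from every training sample."""
    kept = []
    for g in generated:
        nearest = min(math.dist(g, t) for t in training)
        if nearest >= min_distance:
            kept.append(g)  # sufficiently distinct from all training samples
    return kept


training = [(0.0, 0.0), (1.0, 1.0)]
generated = [(0.05, 0.0), (0.5, 0.5), (3.0, 3.0)]
kept = reject_memorized(generated, training, min_distance=0.2)
```

In a training loop, only the kept samples would contribute to the generator update for that batch. For images, the distance would more plausibly be computed in a learned feature space than on raw pixels.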

artificial intelligence, machine learning, memorization, (15 more...)